Week 5: Dimensionality reduction techniques

This weeks data is part of the UN Development Programme, Human Development Reports, which you can find here The data is combined from two different data sets: Human Development Index and its components and Gender Inequality Index

At first I will load the ‘human’ data and then explore it´s structure and dimensions.

Loading the data

setwd("/Users/mirva/IODS-project/data")
human <- read.table("http://s3.amazonaws.com/assets.datacamp.com/production/course_2218/datasets/human2.txt", sep=",", header=TRUE)

Structure and dimensions

str(human)
## 'data.frame':    155 obs. of  8 variables:
##  $ Edu2.FM  : num  1.007 0.997 0.983 0.989 0.969 ...
##  $ Labo.FM  : num  0.891 0.819 0.825 0.884 0.829 ...
##  $ Edu.Exp  : num  17.5 20.2 15.8 18.7 17.9 16.5 18.6 16.5 15.9 19.2 ...
##  $ Life.Exp : num  81.6 82.4 83 80.2 81.6 80.9 80.9 79.1 82 81.8 ...
##  $ GNI      : int  64992 42261 56431 44025 45435 43919 39568 52947 42155 32689 ...
##  $ Mat.Mor  : int  4 6 6 5 6 7 9 28 11 8 ...
##  $ Ado.Birth: num  7.8 12.1 1.9 5.1 6.2 3.8 8.2 31 14.5 25.3 ...
##  $ Parli.F  : num  39.6 30.5 28.5 38 36.9 36.9 19.9 19.4 28.2 31.4 ...
dim(human)
## [1] 155   8

As you can see there are 155 observations and 8 variables in this data. The variables are as follows: * Edu2.FM = The ratio of female and male populations with secondary education in each country (female/ male) * Labo.FM = The ratio of labour force participation of females and males in each country (female/male) * Life.Exp = Life expectancy at birth (years) * Edu.Exp = Expected years of schooling * GNI = Gross national income (GNI) per capita * Mat.Mor = Maternal mortality ratio (deaths per 100 000 live births) * Ado.Birth = Adolescent birth rate (births per 1 000 women ages 15–19) * Parli.F = Share of seats in parliament (% held by women)

Now that you know what the variables used here are let´s look at the graphics and summaries of the data.

The graphical overview of the data

library(GGally); library(dplyr); library(corrplot)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:GGally':
## 
##     nasa
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
ggpairs(human)

cor(human) %>% corrplot()

As you can see from the correlation plots above some correlations are quite strong and some are very minimal. Strongest correlations we can find are between Mat.Mor and Life.Exp (-.857), Edu.Exp and Life.Exp (r=.789), Mat.Mor and Ado.Birth (r=.759), Life.Exp and GNI (r=.627) and Edu.Exp and GNI (r=.624). Weakest correlations are between Edu2.FM and Labo.FM (r=.00956), Labo.FM and GNI (r=-.0217), Labo.FM and Edu.Exp (r=.0473), Ado.Birth and Parli.F (r=-.0709) and Parli.F and Edu2.FM (r=.0786).

Summaries of the variables in the data

summary(human)
##     Edu2.FM          Labo.FM          Edu.Exp         Life.Exp    
##  Min.   :0.1717   Min.   :0.1857   Min.   : 5.40   Min.   :49.00  
##  1st Qu.:0.7264   1st Qu.:0.5984   1st Qu.:11.25   1st Qu.:66.30  
##  Median :0.9375   Median :0.7535   Median :13.50   Median :74.20  
##  Mean   :0.8529   Mean   :0.7074   Mean   :13.18   Mean   :71.65  
##  3rd Qu.:0.9968   3rd Qu.:0.8535   3rd Qu.:15.20   3rd Qu.:77.25  
##  Max.   :1.4967   Max.   :1.0380   Max.   :20.20   Max.   :83.50  
##       GNI            Mat.Mor         Ado.Birth         Parli.F     
##  Min.   :   581   Min.   :   1.0   Min.   :  0.60   Min.   : 0.00  
##  1st Qu.:  4198   1st Qu.:  11.5   1st Qu.: 12.65   1st Qu.:12.40  
##  Median : 12040   Median :  49.0   Median : 33.60   Median :19.30  
##  Mean   : 17628   Mean   : 149.1   Mean   : 47.16   Mean   :20.91  
##  3rd Qu.: 24512   3rd Qu.: 190.0   3rd Qu.: 71.95   3rd Qu.:27.95  
##  Max.   :123124   Max.   :1100.0   Max.   :204.80   Max.   :57.50

As we can see from the summaries and also from the upper of the correlation plots above the median of the Edu2.FM is around 0.9375 so the ratio of female and male populations with secondary education is almost 1 meaning almost equal ratio of education. But in some country the minimum is 0.1717 meaning that the women get 2nd education rarely compared to men. The maximum here is 1.4967 meaning that in some countries women are more educated than men.

If we look at the summary of the Labo.FM median is 0.7535 so the men are in labour force more often than women. In many countries women are at home taking care of household and it might explain something of this result. It is interesting that the maximum here is only 1.0380 which tells that women in average don´t work more often than men.

Edu.Exp seems to be quite normally distributed and the median here is 13.50 (expected years of schooling). The 1st and 3rd quartiles are quite near this but the minimum is only 5.40 telling that some of people won´t probably go to school at all or just for few years. Maximum 20.20 years is also quite high meaning that in some countries almost everyone gets at least 2nd education and even higher.

The median of Life.Exp is 74.20 years. Quite shockingly the minimum here is only 49 years which might be a consequense for example of high child mortality. The maximum is 83.50 years so in some country the life expectancy is almost two times larger than in the country where it is 49 years.
As we can see in GNI the range is huge. The minimum is only 581$ and the maximum 123 124$ while the median is 12 040$. We can see from the graph also that GNI is far from uniform (or even normal) distribution and therefore is quite inequal.

Almost the same distribution is true with the Mat.Mor but here the minimum is 1 death, median 49 deaths and maximum 1100 deaths and 50% is between 11.5 and 190 deaths. So this means that in some country there is only 1 death per 100 000 life births (0.001%) and in some country 1100 deaths (1.1%) so the differences are very big but in most countries the death rate is quite low.

The median of the Ado.Birth is surplisingly high, 33.60 which is 3.36% of womean ages 15-19. Also the range is quite big; minimum is 0.60 and maximum 204.80.

Parli.F tells much of the equality between the genders. The minimum here is 0 meaning that in some country there are no women in parliament. The maximum is 57.50 so in some country there are more women in parliament than men. The median is 19.30 so in most countries women are under represented in parliaments compared to men.

Principal component analysis (PCA)

Because there are 8 variables it would be hard to interpret the relations between them if we could use only pairwise comparisons. That´s why I will use principal component analysis (PCA) which reduces variables to combinations of the data. I will perform the PCA first on the non standardized human data and then on the standardized human data

PCA on non standardized data

pca1_human <- prcomp(human)
s1 <- summary(pca1_human)
s1
## Importance of components:
##                              PC1      PC2   PC3   PC4   PC5   PC6    PC7
## Standard deviation     1.854e+04 185.5219 25.19 11.45 3.766 1.566 0.1912
## Proportion of Variance 9.999e-01   0.0001  0.00  0.00 0.000 0.000 0.0000
## Cumulative Proportion  9.999e-01   1.0000  1.00  1.00 1.000 1.000 1.0000
##                           PC8
## Standard deviation     0.1591
## Proportion of Variance 0.0000
## Cumulative Proportion  1.0000
pca1_pr <- round(100*s1$importance[2,], digits = 1) 
pca1_pr
## PC1 PC2 PC3 PC4 PC5 PC6 PC7 PC8 
## 100   0   0   0   0   0   0   0
pc1_lab <- paste0(names(pca1_pr), " (", pca1_pr, "%)")
biplot(pca1_human, cex = c(0.8, 1), col = c("grey40", "deeppink2"), xlab = pc1_lab[1], ylab = pc1_lab[2])
## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped

## Warning in arrows(0, 0, y[, 1L] * 0.8, y[, 2L] * 0.8, col = col[2L], length
## = arrow.len): zero-length arrow is of indeterminate angle and so skipped

PCA on standardized data

human_std <- scale(human)
summary(human_std)
##     Edu2.FM           Labo.FM           Edu.Exp           Life.Exp      
##  Min.   :-2.8189   Min.   :-2.6247   Min.   :-2.7378   Min.   :-2.7188  
##  1st Qu.:-0.5233   1st Qu.:-0.5484   1st Qu.:-0.6782   1st Qu.:-0.6425  
##  Median : 0.3503   Median : 0.2316   Median : 0.1140   Median : 0.3056  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.5958   3rd Qu.: 0.7350   3rd Qu.: 0.7126   3rd Qu.: 0.6717  
##  Max.   : 2.6646   Max.   : 1.6632   Max.   : 2.4730   Max.   : 1.4218  
##       GNI             Mat.Mor          Ado.Birth          Parli.F       
##  Min.   :-0.9193   Min.   :-0.6992   Min.   :-1.1325   Min.   :-1.8203  
##  1st Qu.:-0.7243   1st Qu.:-0.6496   1st Qu.:-0.8394   1st Qu.:-0.7409  
##  Median :-0.3013   Median :-0.4726   Median :-0.3298   Median :-0.1403  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.: 0.3712   3rd Qu.: 0.1932   3rd Qu.: 0.6030   3rd Qu.: 0.6127  
##  Max.   : 5.6890   Max.   : 4.4899   Max.   : 3.8344   Max.   : 3.1850
pca2_human <- prcomp(human_std)
s2 <- summary(pca2_human)
s2
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6
## Standard deviation     2.0708 1.1397 0.87505 0.77886 0.66196 0.53631
## Proportion of Variance 0.5361 0.1624 0.09571 0.07583 0.05477 0.03595
## Cumulative Proportion  0.5361 0.6984 0.79413 0.86996 0.92473 0.96069
##                            PC7     PC8
## Standard deviation     0.45900 0.32224
## Proportion of Variance 0.02634 0.01298
## Cumulative Proportion  0.98702 1.00000
pca2_pr <- round(100*s2$importance[2,], digits = 1) 
pca2_pr
##  PC1  PC2  PC3  PC4  PC5  PC6  PC7  PC8 
## 53.6 16.2  9.6  7.6  5.5  3.6  2.6  1.3
pc2_lab <- paste0(names(pca2_pr), " (", pca2_pr, "%)")
biplot(pca2_human, cex = c(0.8, 0.7), col = c("grey40", "deeppink2"), xlab = pc2_lab[1], ylab = pc2_lab[2])

As you can see the plots created are very different. If we look at first at the PCA on non standardized data we can see that the first component PC1 explains 99,99% of the variation and PC2 0,01% so as we can see from the cumulative proportion together they explain 100% of the variation. Because there are different units of measure in the data variances are quite different and the PCA is really hard to interpret on non standardized data.

If we then look at the PCA on standardized data we can see that there are more components with some proportions and 100% is acquired not until PC8. PC1 and PC2 together explain 69,84% of the variaton. Also the plot is now easier to interpret. If we look at the arrows we can see that there are quite strong correlations between Edu.Exp, Life.Exp, Edu2.FM and GNI and also between Mat.Mor and Ado.Birth as we already noticed earlier. These 6 variables also correlate strongly with PC1. Parli.F and Labo.FM correlate most with each other and PC2. We can also see that for example in Mozambique and Burundi the ratio of labour force participation of females and males in each country and share of seats in parliament are quite high and also the maternal mortality ratio and adolescent birth ratio. In Mozambique and Burundi the ratio of female and male populations with secondary education, expected years of schooling, life expectancy at birth and gross national income (GNI) per capita are quite low. For example Iran has quite opposite pattern.

Tea!

At first I will load the tea dataset from the FactoMineR package

library(FactoMineR); library(tidyr); library(ggplot2)
data("tea")
str(tea)
## 'data.frame':    300 obs. of  36 variables:
##  $ breakfast       : Factor w/ 2 levels "breakfast","Not.breakfast": 1 1 2 2 1 2 1 2 1 1 ...
##  $ tea.time        : Factor w/ 2 levels "Not.tea time",..: 1 1 2 1 1 1 2 2 2 1 ...
##  $ evening         : Factor w/ 2 levels "evening","Not.evening": 2 2 1 2 1 2 2 1 2 1 ...
##  $ lunch           : Factor w/ 2 levels "lunch","Not.lunch": 2 2 2 2 2 2 2 2 2 2 ...
##  $ dinner          : Factor w/ 2 levels "dinner","Not.dinner": 2 2 1 1 2 1 2 2 2 2 ...
##  $ always          : Factor w/ 2 levels "always","Not.always": 2 2 2 2 1 2 2 2 2 2 ...
##  $ home            : Factor w/ 2 levels "home","Not.home": 1 1 1 1 1 1 1 1 1 1 ...
##  $ work            : Factor w/ 2 levels "Not.work","work": 1 1 2 1 1 1 1 1 1 1 ...
##  $ tearoom         : Factor w/ 2 levels "Not.tearoom",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ friends         : Factor w/ 2 levels "friends","Not.friends": 2 2 1 2 2 2 1 2 2 2 ...
##  $ resto           : Factor w/ 2 levels "Not.resto","resto": 1 1 2 1 1 1 1 1 1 1 ...
##  $ pub             : Factor w/ 2 levels "Not.pub","pub": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Tea             : Factor w/ 3 levels "black","Earl Grey",..: 1 1 2 2 2 2 2 1 2 1 ...
##  $ How             : Factor w/ 4 levels "alone","lemon",..: 1 3 1 1 1 1 1 3 3 1 ...
##  $ sugar           : Factor w/ 2 levels "No.sugar","sugar": 2 1 1 2 1 1 1 1 1 1 ...
##  $ how             : Factor w/ 3 levels "tea bag","tea bag+unpackaged",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ where           : Factor w/ 3 levels "chain store",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ price           : Factor w/ 6 levels "p_branded","p_cheap",..: 4 6 6 6 6 3 6 6 5 5 ...
##  $ age             : int  39 45 47 23 48 21 37 36 40 37 ...
##  $ sex             : Factor w/ 2 levels "F","M": 2 1 1 2 2 2 2 1 2 2 ...
##  $ SPC             : Factor w/ 7 levels "employee","middle",..: 2 2 4 6 1 6 5 2 5 5 ...
##  $ Sport           : Factor w/ 2 levels "Not.sportsman",..: 2 2 2 1 2 2 2 2 2 1 ...
##  $ age_Q           : Factor w/ 5 levels "15-24","25-34",..: 3 4 4 1 4 1 3 3 3 3 ...
##  $ frequency       : Factor w/ 4 levels "1/day","1 to 2/week",..: 1 1 3 1 3 1 4 2 3 3 ...
##  $ escape.exoticism: Factor w/ 2 levels "escape-exoticism",..: 2 1 2 1 1 2 2 2 2 2 ...
##  $ spirituality    : Factor w/ 2 levels "Not.spirituality",..: 1 1 1 2 2 1 1 1 1 1 ...
##  $ healthy         : Factor w/ 2 levels "healthy","Not.healthy": 1 1 1 1 2 1 1 1 2 1 ...
##  $ diuretic        : Factor w/ 2 levels "diuretic","Not.diuretic": 2 1 1 2 1 2 2 2 2 1 ...
##  $ friendliness    : Factor w/ 2 levels "friendliness",..: 2 2 1 2 1 2 2 1 2 1 ...
##  $ iron.absorption : Factor w/ 2 levels "iron absorption",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ feminine        : Factor w/ 2 levels "feminine","Not.feminine": 2 2 2 2 2 2 2 1 2 2 ...
##  $ sophisticated   : Factor w/ 2 levels "Not.sophisticated",..: 1 1 1 2 1 1 1 2 2 1 ...
##  $ slimming        : Factor w/ 2 levels "No.slimming",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ exciting        : Factor w/ 2 levels "exciting","No.exciting": 2 1 2 2 2 2 2 2 2 2 ...
##  $ relaxing        : Factor w/ 2 levels "No.relaxing",..: 1 1 2 2 2 2 2 2 2 2 ...
##  $ effect.on.health: Factor w/ 2 levels "effect on health",..: 2 2 2 2 2 2 2 2 2 2 ...
dim(tea)
## [1] 300  36

There are 300 observations and 36 variables in this data.

I choose to keep the following columns: tea.time, tearoom, Tea, How, sugar, how and where.

keep_columns <- c("tea.time", "tearoom", "Tea", "How", "sugar", "how", "where")
tea_time <- dplyr::select(tea, one_of(keep_columns))
summary(tea_time)
##          tea.time          tearoom           Tea         How     
##  Not.tea time:131   Not.tearoom:242   black    : 74   alone:195  
##  tea time    :169   tearoom    : 58   Earl Grey:193   lemon: 33  
##                                       green    : 33   milk : 63  
##                                                       other:  9  
##       sugar                     how                       where    
##  No.sugar:155   tea bag           :170   chain store         :192  
##  sugar   :145   tea bag+unpackaged: 94   chain store+tea shop: 78  
##                 unpackaged        : 36   tea shop            : 30  
## 
str(tea_time)
## 'data.frame':    300 obs. of  7 variables:
##  $ tea.time: Factor w/ 2 levels "Not.tea time",..: 1 1 2 1 1 1 2 2 2 1 ...
##  $ tearoom : Factor w/ 2 levels "Not.tearoom",..: 1 1 1 1 1 1 1 1 1 2 ...
##  $ Tea     : Factor w/ 3 levels "black","Earl Grey",..: 1 1 2 2 2 2 2 1 2 1 ...
##  $ How     : Factor w/ 4 levels "alone","lemon",..: 1 3 1 1 1 1 1 3 3 1 ...
##  $ sugar   : Factor w/ 2 levels "No.sugar","sugar": 2 1 1 2 1 1 1 1 1 1 ...
##  $ how     : Factor w/ 3 levels "tea bag","tea bag+unpackaged",..: 1 1 1 1 1 1 1 1 2 2 ...
##  $ where   : Factor w/ 3 levels "chain store",..: 1 1 1 1 1 1 1 1 2 2 ...

As you can see from the summary there are the following options in the chosen variables: * tea.time: not tea time or tea time * tearoom: not in a tearoom or in a tearoom * Tea: black, earl grey or green * How: alone, with lemon, milk or other * sugar: no sugar or with sugar * how: with a tea bag, with a tea bag and loose leaf (unpackaged) or only loose leaf * where: from a chain store, from a chain store and a tea shop or only from a tea shop

gather(tea_time) %>% ggplot(aes(value)) + facet_wrap("key", scales = "free") + geom_bar() + theme(axis.text.x = element_text(angle = 45, hjust = 1, size = 8))
## Warning: attributes are not identical across measure variables; they will
## be dropped

From the barplots we can see that most of the people have tea time, don´t drink at tearoom, drink earl grey, drink tea alone without any lemon or milk, use sugar, use only tea bags and buy their tea from a chain store.

Multiple Correspondence Analysis

I will use Multiple Correspondence Analysis (MCA) to inspect the connections between the variables I chose

mca <- MCA(tea_time, graph = FALSE)
summary(mca)
## 
## Call:
## MCA(X = tea_time, graph = FALSE) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6
## Variance               0.280   0.233   0.185   0.160   0.148   0.141
## % of var.             16.335  13.580  10.802   9.327   8.633   8.213
## Cumulative % of var.  16.335  29.915  40.717  50.044  58.677  66.890
##                        Dim.7   Dim.8   Dim.9  Dim.10  Dim.11  Dim.12
## Variance               0.131   0.122   0.101   0.089   0.072   0.053
## % of var.              7.635   7.100   5.880   5.205   4.191   3.099
## Cumulative % of var.  74.525  81.625  87.505  92.711  96.901 100.000
## 
## Individuals (the 10 first)
##                 Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3    ctr
## 1            | -0.525  0.328  0.256 |  0.021  0.001  0.000 | -0.242  0.105
## 2            | -0.353  0.148  0.082 | -0.076  0.008  0.004 | -0.622  0.696
## 3            | -0.295  0.104  0.140 | -0.148  0.031  0.035 | -0.284  0.145
## 4            | -0.680  0.551  0.645 | -0.105  0.016  0.015 |  0.230  0.095
## 5            | -0.537  0.344  0.414 | -0.040  0.002  0.002 | -0.186  0.062
## 6            | -0.537  0.344  0.414 | -0.040  0.002  0.002 | -0.186  0.062
## 7            | -0.295  0.104  0.140 | -0.148  0.031  0.035 | -0.284  0.145
## 8            | -0.111  0.015  0.009 | -0.183  0.048  0.023 | -0.720  0.933
## 9            |  0.567  0.383  0.199 | -0.472  0.319  0.138 |  0.093  0.015
## 10           |  0.907  0.980  0.384 | -0.147  0.031  0.010 | -0.346  0.215
##                cos2  
## 1             0.054 |
## 2             0.255 |
## 3             0.129 |
## 4             0.074 |
## 5             0.050 |
## 6             0.050 |
## 7             0.129 |
## 8             0.359 |
## 9             0.005 |
## 10            0.056 |
## 
## Categories (the 10 first)
##                  Dim.1     ctr    cos2  v.test     Dim.2     ctr    cos2
## Not.tea time |  -0.505   5.684   0.198  -7.690 |   0.205   1.121   0.032
## tea time     |   0.392   4.406   0.198   7.690 |  -0.159   0.869   0.032
## Not.tearoom  |  -0.327   4.392   0.445 -11.538 |   0.046   0.104   0.009
## tearoom      |   1.363  18.324   0.445  11.538 |  -0.191   0.434   0.009
## black        |   0.462   2.684   0.070   4.570 |   0.178   0.479   0.010
## Earl Grey    |  -0.114   0.425   0.023  -2.643 |  -0.248   2.423   0.111
## green        |  -0.370   0.769   0.017  -2.250 |   1.050   7.443   0.136
## alone        |  -0.157   0.819   0.046  -3.704 |   0.146   0.849   0.040
## lemon        |   0.516   1.492   0.033   3.134 |   0.259   0.453   0.008
## milk         |  -0.049   0.026   0.001  -0.439 |  -0.401   2.069   0.043
##               v.test     Dim.3     ctr    cos2  v.test  
## Not.tea time   3.114 |   0.167   0.934   0.021   2.535 |
## tea time      -3.114 |  -0.129   0.724   0.021  -2.535 |
## Not.tearoom    1.620 |   0.016   0.016   0.001   0.572 |
## tearoom       -1.620 |  -0.068   0.068   0.001  -0.572 |
## black          1.760 |  -0.986  18.487   0.318  -9.753 |
## Earl Grey     -5.754 |   0.437   9.462   0.344  10.140 |
## green          6.383 |  -0.343   1.000   0.015  -2.087 |
## alone          3.438 |  -0.185   1.723   0.064  -4.368 |
## lemon          1.575 |   1.622  22.318   0.325   9.859 |
## milk          -3.572 |  -0.076   0.092   0.002  -0.674 |
## 
## Categorical variables (eta2)
##                Dim.1 Dim.2 Dim.3  
## tea.time     | 0.198 0.032 0.021 |
## tearoom      | 0.445 0.009 0.001 |
## Tea          | 0.076 0.169 0.375 |
## How          | 0.150 0.106 0.372 |
## sugar        | 0.070 0.012 0.393 |
## how          | 0.462 0.610 0.028 |
## where        | 0.560 0.692 0.106 |

Here we can see that the dimensions 1 and 2 cover 29,915% of the variance. From the last table we can see how well different variables correlate with the first three dimensions.

plot(mca, invisible=c("ind"), habillage = "quali")

From the plot above we can see for example that buying unpackaged tea is closely related to buying tea from a tea shop. Also buying only tea bags is closely related to buying tea from a chain store. Drinking earl grey and drinking tea with lemon are closely related and also drinking black tea and drinking tea with milk.